#: Used for headings in R Markdown text. Additional heading levels can be created by adding further hash signs.
**Word**: The word is rendered in bold.
*Word*: The word is rendered in italics.
Special characters can be "escaped" with a backslash (\) so that they appear as literal characters.
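A minimal sketch of these formatting rules as they would appear in R Markdown text:

```markdown
# Level-1 heading
## Level-2 heading
**bold text** and *italic text*
An escaped asterisk appears literally: \*no italics\*
```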
# install.packages("readxl")
# library(readxl)
lmm = read_excel("LMM.xlsx")
head(lmm) # Shows the first rows of the dataset "lmm"
## # A tibble: 6 × 4
## ids department experience salary
## <dbl> <chr> <dbl> <dbl>
## 1 1 sociology 0 36095.
## 2 2 biology 1 55254.
## 3 3 english 1 59140.
## 4 4 informatics 8 78325.
## 5 5 statistics 1 74054.
## 6 6 sociology 3 46508.
summary(lmm) # Statistical summary of the dataset "lmm", including minimum, median, mean, maximum, and quartiles for numeric variables
## ids department experience salary
## Min. : 1.00 Length:100 Min. :0.00 Min. :36095
## 1st Qu.: 25.75 Class :character 1st Qu.:2.00 1st Qu.:52125
## Median : 50.50 Mode :character Median :4.00 Median :62229
## Mean : 50.50 Mean :4.39 Mean :64278
## 3rd Qu.: 75.25 3rd Qu.:7.00 3rd Qu.:78187
## Max. :100.00 Max. :9.00 Max. :90027
# View(lmm) # Opens a spreadsheet view of the dataset "lmm" in the RStudio viewer
str(lmm) # Shows the structure of the dataset "lmm", including variable types and the first example values
## tibble [100 × 4] (S3: tbl_df/tbl/data.frame)
## $ ids : num [1:100] 1 2 3 4 5 6 7 8 9 10 ...
## $ department: chr [1:100] "sociology" "biology" "english" "informatics" ...
## $ experience: num [1:100] 0 1 1 8 1 3 8 2 3 2 ...
## $ salary : num [1:100] 36095 55254 59140 78325 74054 ...
names(lmm) # Returns the names of the variables (columns) of the dataset "lmm"
## [1] "ids" "department" "experience" "salary"
unique(lmm$department) # Shows all unique values of the variable "department"
## [1] "sociology" "biology" "english" "informatics" "statistics"
lapply(lmm[, c("department")], unique) # "lapply()" applies a chosen function (e.g. "unique()") to several variables of a dataset at once.
## $department
## [1] "sociology" "biology" "english" "informatics" "statistics"
\(\rightarrow\) A simple model for estimating average salary differences between departments using fixed effects (fixed-effects model)
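Before fitting, it can help to see how `lm()` encodes a categorical predictor: the factor is expanded into dummy variables in the model matrix, with the alphabetically first level as the reference category. A small sketch on toy data (not the LMM dataset):

```r
# Toy factor: "biology" is the alphabetically first level and becomes the reference
dept = factor(c("biology", "english", "biology", "sociology"))
model.matrix(~ dept)  # intercept column plus one dummy column per remaining level
```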
model1 = lm(salary ~ department, lmm)
summary(model1)
##
## Call:
## lm(formula = salary ~ department, data = lmm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11922.3 -2985.0 -52.7 3150.7 11138.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 51843 1127 45.983 < 0.0000000000000002 ***
## departmentenglish 10617 1594 6.659 0.00000000178 ***
## departmentinformatics 26080 1594 16.357 < 0.0000000000000002 ***
## departmentsociology -3826 1594 -2.399 0.0184 *
## departmentstatistics 29304 1594 18.379 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5042 on 95 degrees of freedom
## Multiple R-squared: 0.8809, Adjusted R-squared: 0.8759
## F-statistic: 175.6 on 4 and 95 DF, p-value: < 0.00000000000000022
qqnorm(resid(model1))
shapiro.test(resid(model1))
##
## Shapiro-Wilk normality test
##
## data: resid(model1)
## W = 0.99367, p-value = 0.925
hist(resid(model1))
# install.packages("lme4")
# library(lme4)
# install.packages("lmerTest")
# library(lmerTest)
model2 = lmer(salary ~ 1 + (1|department), lmm)
summary(model2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: salary ~ 1 + (1 | department)
## Data: lmm
##
## REML criterion at convergence: 1994.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.38295 -0.58402 -0.00795 0.61588 2.22449
##
## Random effects:
## Groups Name Variance Std.Dev.
## department (Intercept) 221997992 14900
## Residual 25422058 5042
## Number of obs: 100, groups: department, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 64278 6682 4 9.619 0.000653 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model2)$coefficients # $coefficients shows only the fixed effects
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 64277.74 6682.351 4 9.61903 0.0006530776
VarCorr(model2) # Raw structure of the variance-covariance matrix of the random effects; prints only the standard deviations
## Groups Name Std.Dev.
## department (Intercept) 14900
## Residual 5042
as.data.frame(VarCorr(model2)) # Exposes the full results (variance AND standard deviation)
## grp var1 var2 vcov sdcor
## 1 department (Intercept) <NA> 221997992 14899.597
## 2 Residual <NA> <NA> 25422058 5042.029
confint(model2, level = 0.95) # Output extension: confidence intervals
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 7911.713 29098.712
## .sigma 4401.953 5853.681
## (Intercept) 49907.902 78647.572
ICC_model2 = 221997992 / (221997992 + 25422058) # ICC: between-group variance / total variance
ICC_model2
## [1] 0.8972514
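The ICC above is computed from hard-coded numbers; a small helper (the function name `icc` is my own, not part of lme4) makes the calculation reusable:

```r
# ICC = between-group variance / (between-group variance + residual variance)
icc = function(var_between, var_residual) var_between / (var_between + var_residual)
icc(221997992, 25422058)  # variances from the VarCorr output above
```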
model3 = lmer(salary ~ experience + (1|department), lmm)
summary(model3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: salary ~ experience + (1 | department)
## Data: lmm
##
## REML criterion at convergence: 1953.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.93140 -0.80670 -0.07917 0.80156 2.12467
##
## Random effects:
## Groups Name Variance Std.Dev.
## department (Intercept) 215356636 14675
## Residual 18880965 4345
## Number of obs: 100, groups: department, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 60469.557 6609.551 4.079 9.149 0.000723 ***
## experience 867.468 148.681 94.013 5.834 0.0000000762 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## experience -0.099
# Replace REML with ML ...?
# model3 = lmer(salary ~ experience + (1|department), REML = FALSE, lmm)
# summary(model3)
summary(model3)$coefficients
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 60469.5567 6609.5511 4.07891 9.148814 0.00072262119073
## experience 867.4677 148.6809 94.01310 5.834426 0.00000007616036
confint(model3, level = 0.95)
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 7805.4756 28645.859
## .sigma 3773.5921 5018.089
## (Intercept) 46288.9712 74656.162
## experience 575.0961 1160.768
sqrt(221997992) # Check: square root of the intercept variance equals the reported Std.Dev.
## [1] 14899.6
# install.packages("ggplot2")
# library(ggplot2)
lmm$random.intercept.fixed.slope = predict(model3)
ggplot(lmm, aes(x = experience, y = salary, color = department)) +
geom_point() +
geom_line(aes(y = random.intercept.fixed.slope)) +
labs(x = "Experience", y = "Salary", color = "Department") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.line = element_line())
model4 = lmer(salary ~ experience + (experience|department), lmm)
summary(model4)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: salary ~ experience + (experience | department)
## Data: lmm
##
## REML criterion at convergence: 1941.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.76050 -0.84375 -0.01161 0.73272 2.32260
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## department (Intercept) 227233147 15074.3
## experience 469881 685.5 -0.26
## Residual 15518494 3939.4
## Number of obs: 100, groups: department, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 60149.368 6779.433 4.000 8.872 0.000892 ***
## experience 922.801 335.170 3.987 2.753 0.051390 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## experience -0.274
summary(model4)$coefficients
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 60149.3685 6779.4330 3.999646 8.872330 0.0008917901
## experience 922.8008 335.1701 3.986902 2.753231 0.0513899885
confint(model4, level = 0.95)
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 7944.7287779 29496.9246374
## .sig02 -0.8478171 0.6389808
## .sig03 256.5642789 1430.1370444
## .sigma 3427.0528130 4592.9136408
## (Intercept) 45594.6492882 74748.1610306
## experience 206.2522390 1648.7374105
lmm$random.intercept.random.slope = predict(model4)
ggplot(lmm, aes(x = experience, y = salary, color = department)) +
geom_point() +
geom_line(aes(y = random.intercept.random.slope)) +
labs(x = "Experience", y = "Salary", color = "Department") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.line = element_line())
# Model 3: Random Intercept, Fixed Slope
# Model 4: Random Intercept, Random Slope
anova(model3, model4)
## refitting model(s) with ML (instead of REML)
## Data: lmm
## Models:
## model3: salary ~ experience + (1 | department)
## model4: salary ~ experience + (experience | department)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## model3 4 1992.2 2002.6 -992.10 1984.2
## model4 6 1986.5 2002.1 -987.23 1974.5 9.7504 2 0.007634 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
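The chi-square test in this anova output can be reproduced by hand from the reported statistic and the difference in parameter counts (npar 6 vs. 4):

```r
chisq = 9.7504                         # Chisq from the anova output above
df = 6 - 4                             # difference in number of model parameters
pchisq(chisq, df, lower.tail = FALSE)  # upper-tail p-value, ~0.0076
```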
Why is this model not meaningful?
model5 = lmer(salary ~ experience + (0 + experience | department), lmm)
summary(model5)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: salary ~ experience + (0 + experience | department)
## Data: lmm
##
## REML criterion at convergence: 2080.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.67330 -0.57808 -0.03245 0.60669 2.86364
##
## Random effects:
## Groups Name Variance Std.Dev.
## department experience 5309875 2304
## Residual 74068552 8606
## Number of obs: 100, groups: department, 5
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 59102.075 1546.912 94.013 38.206 <0.0000000000000002 ***
## experience 1065.242 1071.503 4.445 0.994 0.371
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## experience -0.227
lmm$fixed.intercept.random.slope = predict(model5)
ggplot(lmm, aes(x = experience, y = salary, color = department)) +
geom_point() +
geom_line(aes(y = fixed.intercept.random.slope)) +
labs(x = "Experience", y = "Salary", color = "Department") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.line = element_line())
# install.packages("readxl")
# library(readxl)
performance_initial = read_excel("LMM_2.xlsx")
names(performance_initial)
## [1] "subject" "trainer" "training"
## [4] "performance_time0" "performance_time1" "performance_time2"
## [7] "performance_time3" "performance_time4" "performance_time5"
## [10] "performance_time6" "performance_time7" "performance_time8"
## [13] "performance_time9" "performance_time10"
str(performance_initial)
## tibble [80 × 14] (S3: tbl_df/tbl/data.frame)
## $ subject : num [1:80] 1 2 3 4 5 6 7 8 9 10 ...
## $ trainer : num [1:80] 5 5 5 5 5 5 5 5 5 5 ...
## $ training : chr [1:80] "control" "control" "control" "control" ...
## $ performance_time0 : num [1:80] 38.9 32.9 35.3 39.3 36.8 ...
## $ performance_time1 : num [1:80] 37 36.4 36.9 41.7 34.3 ...
## $ performance_time2 : num [1:80] 37.9 31.7 37.1 41.4 35.5 ...
## $ performance_time3 : num [1:80] 37.6 34 36.4 41.1 33 ...
## $ performance_time4 : num [1:80] 38 33.4 36.5 40.3 36.6 ...
## $ performance_time5 : num [1:80] 37.2 32.6 34.9 41.1 38.3 ...
## $ performance_time6 : num [1:80] 37.1 33.2 38 40.4 35.3 ...
## $ performance_time7 : num [1:80] 37.3 35.5 39.6 39.1 33.5 ...
## $ performance_time8 : num [1:80] 34.5 34.8 40 36.2 33.7 ...
## $ performance_time9 : num [1:80] 37.8 32.3 37.3 39.3 35.3 ...
## $ performance_time10: num [1:80] 34.5 32.8 40.9 38.2 35.9 ...
lapply(performance_initial[, c("training")], unique) # Extract the unique values of a selection of variables (here: the variable "training" from the dataset "LMM_2")
## $training
## [1] "control" "training"
head(performance_initial)
## # A tibble: 6 × 14
## subject trainer training performance_time0 performance_time1 performance_time2
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1 5 control 38.9 37.0 37.9
## 2 2 5 control 32.9 36.4 31.7
## 3 3 5 control 35.3 36.9 37.1
## 4 4 5 control 39.3 41.7 41.4
## 5 5 5 control 36.8 34.3 35.5
## 6 6 5 control 43.6 42.3 43.5
## # ℹ 8 more variables: performance_time3 <dbl>, performance_time4 <dbl>,
## # performance_time5 <dbl>, performance_time6 <dbl>, performance_time7 <dbl>,
## # performance_time8 <dbl>, performance_time9 <dbl>, performance_time10 <dbl>
# Wide format >> long format
# install.packages("tidyverse") # moved to setup chunk
# library(tidyverse) # moved to setup chunk
performance = pivot_longer(
data = performance_initial,
cols = c(performance_time0, performance_time1, performance_time2,
performance_time3, performance_time4, performance_time5,
performance_time6, performance_time7, performance_time8,
performance_time9, performance_time10),
names_to = "Time", # The names of the original columns (e.g. "performance_time0") are written into a new variable called "Time".
names_prefix = "performance_time", # The prefix "performance_time" is stripped from the column names, leaving only the time point (e.g. "0", "1", ...).
values_to = "Performance" # The values of the selected columns are written into a new column called "Performance".
)
performance$Time = as.numeric(performance$Time)
names(performance)
## [1] "subject" "trainer" "training" "Time" "Performance"
head(performance)
## # A tibble: 6 × 5
## subject trainer training Time Performance
## <dbl> <dbl> <chr> <dbl> <dbl>
## 1 1 5 control 0 38.9
## 2 1 5 control 1 37.0
## 3 1 5 control 2 37.9
## 4 1 5 control 3 37.6
## 5 1 5 control 4 38.0
## 6 1 5 control 5 37.2
# install.packages("writexl")
# library(writexl)
writexl::write_xlsx(performance, "LMM_2_Long_Format.xlsx")
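For comparison, the same wide-to-long reshaping can be sketched with base R's `reshape()`; a self-contained example on toy data (the real dataset is not needed here):

```r
# Toy wide data: 2 subjects, 3 measurement occasions
wide = data.frame(subject = 1:2,
                  performance_time0 = c(38.9, 32.9),
                  performance_time1 = c(37.0, 36.4),
                  performance_time2 = c(37.9, 31.7))
long = reshape(wide, direction = "long",
               varying = c("performance_time0", "performance_time1", "performance_time2"),
               v.names = "Performance", timevar = "Time", times = 0:2,
               idvar = "subject")
long  # 6 rows: one row per subject-time combination
```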
The data have a longitudinal structure: for each person (subject), the outcome variable (Performance, i.e. sprint time) was measured at several time points (Time).
\(\rightarrow\) This structure allows us to analyze the individual development of sprint performance over time.
In addition, the data contain two grouping variables:
Trainer
Training status
Possible research question: How does sprint time change on average over time depending on participation in an intensive interval training program?
Possible hypothesis: Athletes in the training group show, on average, a stronger decrease in sprint time over time than athletes in the control group.
Assumption: Each athlete has an individual baseline performance (intercept, i.e. initial sprint time) that varies around a global mean, while the growth rate (slope, i.e. improvement in sprint time) is the same for all athletes.
\(\rightarrow\) A simple starting point, since it accounts for individual differences in the starting level (intercept) without adding the complexity of further predictors such as training status, trainer, or their interaction. It allows a first look at the temporal development of sprint times and the variability between athletes.
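In formula terms, this random-intercept model (a sketch; \(u_i\) denotes the subject-specific intercept deviation) reads:

\[
\text{Performance}_{ti} = \beta_0 + u_i + \beta_1\,\text{Time}_{ti} + \varepsilon_{ti},
\qquad u_i \sim N(0,\tau^2),\quad \varepsilon_{ti} \sim N(0,\sigma^2)
\]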
model6 = lmer(Performance ~ Time + (1|subject), performance)
summary(model6)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Performance ~ Time + (1 | subject)
## Data: performance
##
## REML criterion at convergence: 3673.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1338 -0.6552 0.0057 0.6154 3.1002
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 6.683 2.585
## Residual 2.804 1.675
## Number of obs: 880, groups: subject, 80
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 37.47091 0.30771 94.10418 121.77 <0.0000000000000002 ***
## Time -0.27717 0.01785 799.00000 -15.53 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Time -0.290
performance$Random.Intercept.Fixed.Slope = predict(model6, re.form = NA)
# Additional argument "re.form = NA": GLOBAL slope only, no individual slopes
ggplot(performance, aes(x = Time, y = Performance, group = subject)) +
geom_point(size = 0.1) +
geom_line(size = 0.1) +
geom_line(aes(x = Time, y = Random.Intercept.Fixed.Slope), color = "black", size = 0.8) +
labs(x = "Time", y = "Performance") +
theme_minimal() +
theme(
panel.grid = element_blank(),
axis.line = element_line(),
legend.position = "none"
)
model7 = lmer(Performance ~ Time + (Time|subject), performance)
summary(model7)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Performance ~ Time + (Time | subject)
## Data: performance
##
## REML criterion at convergence: 3559.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.15901 -0.64330 -0.02121 0.59911 2.98766
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 9.13058 3.0217
## Time 0.06458 0.2541 -0.52
## Residual 2.10207 1.4499
## Number of obs: 880, groups: subject, 80
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 37.47091 0.34999 78.99982 107.06 < 0.0000000000000002 ***
## Time -0.27717 0.03234 78.99935 -8.57 0.000000000000674 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Time -0.547
The model accounts for:
Random intercept: each person (subject) has an individual starting value (intercept) that varies around the global mean.
Random slope: each person has an individual growth rate (slope) that varies around the global slope.
Training as a fixed predictor: the model tests whether the development of sprint time over time depends on participation in the training. The variable training has two levels: training vs. control.
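As a formula sketch, the model with random intercepts, random slopes, and the Time × training interaction can be written as:

\[
\text{Performance}_{ti} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,\text{Time}_{ti}
+ \beta_2\,\text{Training}_i + \beta_3\,(\text{Time}_{ti}\times\text{Training}_i) + \varepsilon_{ti}
\]

where \((u_{0i}, u_{1i})\) are jointly normally distributed random effects.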
model8 = lmer(Performance ~ Time * training + (Time|subject), performance)
summary(model8)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Performance ~ Time * training + (Time | subject)
## Data: performance
##
## REML criterion at convergence: 3555.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.13461 -0.65993 0.00125 0.61249 2.96479
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 9.21415 3.0355
## Time 0.05937 0.2437 -0.53
## Residual 2.10208 1.4499
## Number of obs: 880, groups: subject, 80
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 37.26848 0.49707 78.00008 74.977 < 0.0000000000000002
## Time -0.19897 0.04430 78.00246 -4.492 0.0000241
## trainingtraining 0.40486 0.70296 78.00008 0.576 0.5663
## Time:trainingtraining -0.15639 0.06264 78.00246 -2.497 0.0146
##
## (Intercept) ***
## Time ***
## trainingtraining
## Time:trainingtraining *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Time trnngt
## Time -0.551
## tranngtrnng -0.707 0.390
## Tm:trnngtrn 0.390 -0.707 -0.551
performance$Random.Intercept.Random.Slope = predict(model8, re.form = NA)
ggplot(performance, aes(x = Time, y = Performance, group = subject)) +
geom_point(aes(color = training), size = 0.1) +
geom_line(aes(color = training), size = 0.1) +
geom_line(aes(x = Time, y = Random.Intercept.Random.Slope), color = "black", size = 1) +
stat_summary(fun = mean, aes(x = Time, y = Performance, group = training, color = training), geom = "line", linetype = "dashed", size = 1) +
labs(x = "Time", y = "Performance", color = "Group") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.line = element_line(), legend.position = "bottom"
)
Research question: To how many factors can the correlations between the nine given scales be reduced?
nrow(EFA) # Total number of cases
## [1] 299
# Sex
table(EFA$SEX)
##
## 1 2
## 139 160
str(EFA$SEX)
## dbl+lbl [1:299] 2, 2, 2, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 2...
## @ label : chr "Geschlecht"
## @ format.spss : chr "F1.0"
## @ display_width: int 5
## @ labels : Named num [1:2] 1 2
## ..- attr(*, "names")= chr [1:2] "männlich" "weiblich"
EFA$SEX = as.factor(EFA$SEX)
levels(EFA$SEX) = c("männlich", "weiblich")
table(EFA$SEX)
##
## männlich weiblich
## 139 160
prop.table(table(EFA$SEX)) * 100
##
## männlich weiblich
## 46.48829 53.51171
round(prop.table(table(EFA$SEX)),3) * 100
##
## männlich weiblich
## 46.5 53.5
# Age
min(EFA$AGE)
## [1] 15
max(EFA$AGE)
## [1] 81
# EFA = EFA[EFA$AGE >= 18 & EFA$AGE <= 65, ]
EFA = subset(EFA, select = -c(SEX, AGE))
names(EFA)
## [1] "Risikobereitschaft" "Impulsivität"
## [3] "Verantwortungslosigkeit" "Aktivität"
## [5] "Kontaktfreudigkeit" "Selbstbewusstsein"
## [7] "Minderwertigkeit" "Schwermut"
## [9] "Besorgtheit"
EFA = EFA %>%
rename(
V1 = Risikobereitschaft,
V2 = Impulsivität,
V3 = Verantwortungslosigkeit,
V4 = Aktivität,
V5 = Kontaktfreudigkeit,
V6 = Selbstbewusstsein,
V7 = Minderwertigkeit,
V8 = Schwermut,
V9 = Besorgtheit
)
names(EFA)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9"
cor(EFA)
## V1 V2 V3 V4 V5 V6
## V1 1.00000000 0.47351664 0.44323461 0.1711286 0.26041822 0.27754924
## V2 0.47351664 1.00000000 0.39993072 0.2024946 0.26048243 0.15869634
## V3 0.44323461 0.39993072 1.00000000 -0.3245046 0.07312967 0.07172468
## V4 0.17112859 0.20249461 -0.32450459 1.0000000 0.45186569 0.33243051
## V5 0.26041822 0.26048243 0.07312967 0.4518657 1.00000000 0.39587257
## V6 0.27754924 0.15869634 0.07172468 0.3324305 0.39587257 1.00000000
## V7 -0.19405554 0.06794166 0.08302704 -0.3622716 -0.40195207 -0.47485579
## V8 -0.07104343 0.13201378 0.19287894 -0.3474677 -0.31153103 -0.25263399
## V9 -0.17214201 0.20128015 0.11456815 -0.1665212 -0.24523910 -0.25264940
## V7 V8 V9
## V1 -0.19405554 -0.07104343 -0.1721420
## V2 0.06794166 0.13201378 0.2012802
## V3 0.08302704 0.19287894 0.1145681
## V4 -0.36227155 -0.34746771 -0.1665212
## V5 -0.40195207 -0.31153103 -0.2452391
## V6 -0.47485579 -0.25263399 -0.2526494
## V7 1.00000000 0.71410876 0.6765718
## V8 0.71410876 1.00000000 0.7026063
## V9 0.67657185 0.70260633 1.0000000
cor(EFA[, c("V1", "V2", "V3", "V4", "V5", "V6")])
## V1 V2 V3 V4 V5 V6
## V1 1.0000000 0.4735166 0.44323461 0.1711286 0.26041822 0.27754924
## V2 0.4735166 1.0000000 0.39993072 0.2024946 0.26048243 0.15869634
## V3 0.4432346 0.3999307 1.00000000 -0.3245046 0.07312967 0.07172468
## V4 0.1711286 0.2024946 -0.32450459 1.0000000 0.45186569 0.33243051
## V5 0.2604182 0.2604824 0.07312967 0.4518657 1.00000000 0.39587257
## V6 0.2775492 0.1586963 0.07172468 0.3324305 0.39587257 1.00000000
cor(EFA[, grep("V[1-9]",names(EFA))])
## V1 V2 V3 V4 V5 V6
## V1 1.00000000 0.47351664 0.44323461 0.1711286 0.26041822 0.27754924
## V2 0.47351664 1.00000000 0.39993072 0.2024946 0.26048243 0.15869634
## V3 0.44323461 0.39993072 1.00000000 -0.3245046 0.07312967 0.07172468
## V4 0.17112859 0.20249461 -0.32450459 1.0000000 0.45186569 0.33243051
## V5 0.26041822 0.26048243 0.07312967 0.4518657 1.00000000 0.39587257
## V6 0.27754924 0.15869634 0.07172468 0.3324305 0.39587257 1.00000000
## V7 -0.19405554 0.06794166 0.08302704 -0.3622716 -0.40195207 -0.47485579
## V8 -0.07104343 0.13201378 0.19287894 -0.3474677 -0.31153103 -0.25263399
## V9 -0.17214201 0.20128015 0.11456815 -0.1665212 -0.24523910 -0.25264940
## V7 V8 V9
## V1 -0.19405554 -0.07104343 -0.1721420
## V2 0.06794166 0.13201378 0.2012802
## V3 0.08302704 0.19287894 0.1145681
## V4 -0.36227155 -0.34746771 -0.1665212
## V5 -0.40195207 -0.31153103 -0.2452391
## V6 -0.47485579 -0.25263399 -0.2526494
## V7 1.00000000 0.71410876 0.6765718
## V8 0.71410876 1.00000000 0.7026063
## V9 0.67657185 0.70260633 1.0000000
cor(EFA[, grep("^V[1-9]$",names(EFA))])
## V1 V2 V3 V4 V5 V6
## V1 1.00000000 0.47351664 0.44323461 0.1711286 0.26041822 0.27754924
## V2 0.47351664 1.00000000 0.39993072 0.2024946 0.26048243 0.15869634
## V3 0.44323461 0.39993072 1.00000000 -0.3245046 0.07312967 0.07172468
## V4 0.17112859 0.20249461 -0.32450459 1.0000000 0.45186569 0.33243051
## V5 0.26041822 0.26048243 0.07312967 0.4518657 1.00000000 0.39587257
## V6 0.27754924 0.15869634 0.07172468 0.3324305 0.39587257 1.00000000
## V7 -0.19405554 0.06794166 0.08302704 -0.3622716 -0.40195207 -0.47485579
## V8 -0.07104343 0.13201378 0.19287894 -0.3474677 -0.31153103 -0.25263399
## V9 -0.17214201 0.20128015 0.11456815 -0.1665212 -0.24523910 -0.25264940
## V7 V8 V9
## V1 -0.19405554 -0.07104343 -0.1721420
## V2 0.06794166 0.13201378 0.2012802
## V3 0.08302704 0.19287894 0.1145681
## V4 -0.36227155 -0.34746771 -0.1665212
## V5 -0.40195207 -0.31153103 -0.2452391
## V6 -0.47485579 -0.25263399 -0.2526494
## V7 1.00000000 0.71410876 0.6765718
## V8 0.71410876 1.00000000 0.7026063
## V9 0.67657185 0.70260633 1.0000000
round(cor(EFA), 2)
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## V1 1.00 0.47 0.44 0.17 0.26 0.28 -0.19 -0.07 -0.17
## V2 0.47 1.00 0.40 0.20 0.26 0.16 0.07 0.13 0.20
## V3 0.44 0.40 1.00 -0.32 0.07 0.07 0.08 0.19 0.11
## V4 0.17 0.20 -0.32 1.00 0.45 0.33 -0.36 -0.35 -0.17
## V5 0.26 0.26 0.07 0.45 1.00 0.40 -0.40 -0.31 -0.25
## V6 0.28 0.16 0.07 0.33 0.40 1.00 -0.47 -0.25 -0.25
## V7 -0.19 0.07 0.08 -0.36 -0.40 -0.47 1.00 0.71 0.68
## V8 -0.07 0.13 0.19 -0.35 -0.31 -0.25 0.71 1.00 0.70
## V9 -0.17 0.20 0.11 -0.17 -0.25 -0.25 0.68 0.70 1.00
# install.packages("corrplot")
# library(corrplot)
corrplot(cor(EFA), method = "color")
# colorbrewer2.org
corrplot(cor(EFA), method = "color", col = colorRampPalette(c("#c51b7d", "white", "#4d9221"))(200))
# install.packages("psych")
# library(psych)
# KMO criterion (Kaiser-Meyer-Olkin criterion)
KMO(EFA) # MSA = Measure of Sampling Adequacy
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = EFA)
## Overall MSA = 0.71
## MSA for each item =
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 0.65 0.66 0.51 0.59 0.83 0.80 0.79 0.76 0.73
# Bartlett test of sphericity
# H0: the correlation matrix is an identity matrix
cortest.bartlett(cor(EFA))
## $chisq
## [1] 343.2489
##
## $p.value
## [1] 0.0000000000000000000000000000000000000000000000000008835381
##
## $df
## [1] 36
# Approach via eigenvalues
# Kaiser criterion
## Only factors with an eigenvalue > 1 are retained
eigen(cor(EFA))
## eigen() decomposition
## $values
## [1] 3.1929551 2.0686210 1.1907609 0.7097815 0.5906573 0.4437106 0.3270208
## [8] 0.2445874 0.2319054
##
## $vectors
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.20180955 -0.48942333 0.20344713 0.26857384 -0.41978643 -0.57276092
## [2,] 0.05295321 -0.56382622 -0.19165245 0.30484118 -0.04053450 0.61663804
## [3,] -0.05434755 -0.50561927 0.48376815 -0.11450019 0.27195504 0.11761254
## [4,] 0.33517591 -0.02651339 -0.61858851 0.23814739 -0.20414333 -0.07730031
## [5,] 0.35670852 -0.20780049 -0.26760594 -0.09271811 0.77058157 -0.32603363
## [6,] 0.34529673 -0.17439342 -0.09410667 -0.80310917 -0.32503994 0.10879561
## [7,] -0.48333867 -0.11258204 -0.18216456 0.07951369 0.02199774 -0.18949111
## [8,] -0.44015227 -0.23338639 -0.18078765 -0.26969617 -0.07327041 -0.32769616
## [9,] -0.40908585 -0.21324391 -0.39857012 -0.18021110 0.03468271 0.10231488
## [,7] [,8] [,9]
## [1,] -0.04999693 -0.18684477 0.25402497
## [2,] -0.39873981 -0.04236226 -0.08344843
## [3,] 0.54950545 0.27218010 -0.17442216
## [4,] 0.49054791 0.28578525 -0.27861790
## [5,] -0.19628183 -0.05439266 0.08825754
## [6,] -0.12730749 0.20320571 0.14672936
## [7,] -0.19877833 0.72621308 0.33303696
## [8,] -0.18582194 -0.16396396 -0.68717485
## [9,] 0.40882395 -0.45718542 0.45608648
eigen(cor(EFA))$values
## [1] 3.1929551 2.0686210 1.1907609 0.7097815 0.5906573 0.4437106 0.3270208
## [8] 0.2445874 0.2319054
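Applying the Kaiser criterion directly to the eigenvalues printed above:

```r
# Eigenvalues from eigen(cor(EFA))$values, as printed above
ev = c(3.1929551, 2.0686210, 1.1907609, 0.7097815, 0.5906573,
       0.4437106, 0.3270208, 0.2445874, 0.2319054)
sum(ev > 1)  # Kaiser criterion: retain factors with eigenvalue > 1 -> 3
```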
# Scree plot
psych::scree(cor(EFA))
# Parallel analysis
fa.parallel(EFA, fm="ml")
## Parallel analysis suggests that the number of factors = 3 and the number of components = 3
fa(EFA, nfactors = 3, fm = "ml", rotate = "promax")$loadings
## Loading required namespace: GPArotation
##
## Loadings:
## ML1 ML3 ML2
## V1 -0.101 0.672
## V2 0.331 0.617 0.266
## V3 0.864 -0.494
## V4 -0.230 1.032
## V5 -0.189 0.209 0.395
## V6 -0.280 0.217 0.233
## V7 0.825
## V8 0.817 0.162
## V9 0.913 0.176
##
## ML1 ML3 ML2
## SS loadings 2.426 1.752 1.632
## Proportion Var 0.270 0.195 0.181
## Cumulative Var 0.270 0.464 0.646
# Promax: an oblique rotation method; the factors are not uncorrelated, i.e. not independent of one another
fa(EFA, nfactors = 3, fm = "ml", rotate = "promax")$communalities
## V1 V2 V3 V4 V5 V6 V7 V8
## 0.5042159 0.5318089 0.6699846 0.8576953 0.3831535 0.2998385 0.7473298 0.7075320
## V9
## 0.7029236
fa_model = fa(EFA, nfactors = 3, fm = "ml", rotate = "promax")
factor_scores = factor.scores(EFA, fa_model)
EFA = cbind(EFA, factor_scores$scores)
names(EFA)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "ML1" "ML3" "ML2"
colnames(EFA)[colnames(EFA) == "ML1"] = "Faktor1"
colnames(EFA)[colnames(EFA) == "ML2"] = "Faktor2"
colnames(EFA)[colnames(EFA) == "ML3"] = "Faktor3"
names(EFA)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7"
## [8] "V8" "V9" "Faktor1" "Faktor3" "Faktor2"
Hypothesis: The correlations between the subtests Arithmetische Kompetenz (arithmetic competence), Figural-Induktives Denken (figural-inductive reasoning), Wortbedeutung (word meaning), and Langzeitgedächtnis (long-term memory) can be explained by a general factor of intelligence.
# install.packages("lavaan")
# library(lavaan)
# Step 1: Define the model
CFA_model = "Generalfaktor =~ V1 + V2 + V3 + V4"
# Step 2: Run the CFA
fit = cfa(CFA_model, data = CFA)
# Step 3: Display the results
summary(fit)
## lavaan 0.6-18 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 196
##
## Model Test User Model:
##
## Test statistic 2.356
## Degrees of freedom 2
## P-value (Chi-square) 0.308
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## Generalfaktor =~
## V1 1.000
## V2 1.231 0.320 3.848 0.000
## V3 1.110 0.293 3.790 0.000
## V4 0.965 0.262 3.679 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .V1 1.029 0.125 8.220 0.000
## .V2 0.644 0.111 5.776 0.000
## .V3 0.830 0.114 7.262 0.000
## .V4 0.787 0.100 7.833 0.000
## Generalfaktor 0.259 0.105 2.472 0.013
# First look at "Model Test User Model": the result is not significant, i.e. the empirical covariance matrix is consistent with the model-implied covariance matrix in the population ("the model fits!")
summary(fit, fit.measures = TRUE)
## lavaan 0.6-18 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 196
##
## Model Test User Model:
##
## Test statistic 2.356
## Degrees of freedom 2
## P-value (Chi-square) 0.308
##
## Model Test Baseline Model:
##
## Test statistic 72.216
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.995
## Tucker-Lewis Index (TLI) 0.984
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1122.279
## Loglikelihood unrestricted model (H1) -1121.101
##
## Akaike (AIC) 2260.558
## Bayesian (BIC) 2286.783
## Sample-size adjusted Bayesian (SABIC) 2261.440
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.030
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.148
## P-value H_0: RMSEA <= 0.050 0.468
## P-value H_0: RMSEA >= 0.080 0.345
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.025
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## Generalfaktor =~
## V1 1.000
## V2 1.231 0.320 3.848 0.000
## V3 1.110 0.293 3.790 0.000
## V4 0.965 0.262 3.679 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .V1 1.029 0.125 8.220 0.000
## .V2 0.644 0.111 5.776 0.000
## .V3 0.830 0.114 7.262 0.000
## .V4 0.787 0.100 7.833 0.000
## Generalfaktor 0.259 0.105 2.472 0.013
# The indices CFI, RMSEA, and SRMR are the most relevant fit measures here
## CFI = Comparative Fit Index
## RMSEA = Root Mean Square Error of Approximation
## SRMR = Standardized Root Mean Square Residual
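Instead of scanning the full summary, the relevant indices can be extracted directly; a minimal sketch using lavaan's fitMeasures() on the fitted object fit:

```r
# extract only the fit indices of interest from the fitted lavaan object
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))

# common rules of thumb: CFI/TLI >= .95, RMSEA <= .06, SRMR <= .08 suggest good fit
```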
summary(fit, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-18 ended normally after 28 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 8
##
## Number of observations 196
##
## Model Test User Model:
##
## Test statistic 2.356
## Degrees of freedom 2
## P-value (Chi-square) 0.308
##
## Model Test Baseline Model:
##
## Test statistic 72.216
## Degrees of freedom 6
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.995
## Tucker-Lewis Index (TLI) 0.984
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1122.279
## Loglikelihood unrestricted model (H1) -1121.101
##
## Akaike (AIC) 2260.558
## Bayesian (BIC) 2286.783
## Sample-size adjusted Bayesian (SABIC) 2261.440
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.030
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.148
## P-value H_0: RMSEA <= 0.050 0.468
## P-value H_0: RMSEA >= 0.080 0.345
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.025
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Generalfaktor =~
## V1 1.000 0.509 0.449
## V2 1.231 0.320 3.848 0.000 0.627 0.616
## V3 1.110 0.293 3.790 0.000 0.565 0.527
## V4 0.965 0.262 3.679 0.000 0.491 0.484
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .V1 1.029 0.125 8.220 0.000 1.029 0.799
## .V2 0.644 0.111 5.776 0.000 0.644 0.621
## .V3 0.830 0.114 7.262 0.000 0.830 0.722
## .V4 0.787 0.100 7.833 0.000 0.787 0.765
## Generalfaktor 0.259 0.105 2.472 0.013 1.000 1.000
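Squaring a standardized loading (Std.all) gives the share of an indicator's variance explained by the factor; lavaan reports these values directly. A sketch on the fitted object fit:

```r
lavInspect(fit, "rsquare")  # R-squared per indicator = squared standardized loading
# e.g. V1: 0.449^2 is approx. 0.201, matching 1 - 0.799 (standardized residual variance)
```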
# install.packages("semPlot")
# library(semPlot)
semPaths(fit)
semPaths(fit, "std", edge.label.cex = 1, layout = "tree", whatLabels = "std")
names(CFA)
## [1] "sex" "age" "V1" "V2" "V3" "V4"
# str(CFA$sex)
CFA$sex = as.factor(CFA$sex)
# Three levels of measurement invariance: configural, metric, scalar
# Configural invariance
## "Does the model fit in both groups at all, i.e. do the same items load on the same factors?"
fit1 = cfa(CFA_model, data = CFA, group = "sex")
# Metric invariance
## "Do the items measure the factor equally in both groups, i.e. are the factor loadings equal across groups?"
fit2 = cfa(CFA_model, data = CFA, group = "sex", group.equal = "loadings")
# Scalar invariance
## "Do the groups have comparable 'starting points' (intercepts) in the measurement?"
fit3 = cfa(CFA_model, data = CFA, group = "sex", group.equal = c("intercepts", "loadings"))
# Testing measurement invariance (model comparison)
lavTestLRT(fit1, fit2, fit3)
##
## Chi-Squared Difference Test
##
## Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
## fit1 4 2271.6 2350.3 5.0780
## fit2 7 2267.6 2336.4 7.0245 1.9465 0.00000 3 0.58359
## fit3 10 2271.4 2330.4 16.8817 9.8571 0.15272 3 0.01982 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
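Besides the chi-square difference test, changes in the fit indices across the invariance levels are often inspected (e.g. a drop in CFI of no more than .01 as a rule of thumb); a sketch assuming the fitted objects fit1 to fit3 from above:

```r
# CFI and RMSEA for the configural, metric, and scalar models side by side
sapply(list(configural = fit1, metric = fit2, scalar = fit3),
       fitMeasures, fit.measures = c("cfi", "rmsea"))
```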
The goal is to examine whether groups (types) of persons can be identified that differ in their Giant Three personality profiles.
Big Three model (Eysenck, 1994): three basic personality traits, Extraversion, Neuroticism, and Psychoticism (see the data set used for the EFA)
CA$P = scale(CA$P) # z-standardize each trait so all three are on the same scale
CA$E = scale(CA$E)
CA$N = scale(CA$N)
head(CA)
## # A tibble: 6 × 3
## P[,1] E[,1] N[,1]
## <dbl> <dbl> <dbl>
## 1 0.101 -0.260 -0.160
## 2 -1.50 -0.203 0.511
## 3 0.272 1.45 -1.30
## 4 -1.67 -0.317 -0.318
## 5 -2.07 -0.317 -0.633
## 6 0.272 -0.545 1.22
# Compute the pairwise distances between the rows
d = dist(CA, method = "euclidean")
as.matrix(d)[1:4, 1:4]
## 1 2 3 4
## 1 0.000000 1.7360575 2.065952 1.7797259
## 2 1.736057 0.0000000 3.028084 0.8534846
## 3 2.065952 3.0280835 0.000000 2.8067161
## 4 1.779726 0.8534846 2.806716 0.0000000
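The Euclidean distance that dist() computes can be checked by hand; a self-contained sketch with toy values (not taken from the CA data):

```r
# two standardized three-trait profiles (toy values)
x = c(0.10, -0.26, -0.16)
y = c(-1.50, -0.20, 0.51)

# Euclidean distance: square root of the summed squared differences
d_manual = sqrt(sum((x - y)^2))

# dist() on the two stacked rows yields the same value
d_dist = as.numeric(dist(rbind(x, y), method = "euclidean"))
all.equal(d_manual, d_dist)
```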
Including a linkage method, typically complete linkage (maximum distance) or Ward's method
hc = hclust(d, method = "ward.D2")
# hc = hclust(d, method = "complete")
hc
##
## Call:
## hclust(d = d, method = "ward.D2")
##
## Cluster method : ward.D2
## Distance : euclidean
## Number of objects: 299
# install.packages("factoextra")
# library(factoextra)
fviz_dend(hc, cex = 0.2) + coord_flip()
Validation: how well does the dendrogram reflect the actual data?
\(\rightarrow\) Correlation between the cophenetic distances and the original distances
coph = cophenetic(hc) # Compute the cophenetic distances of the dendrogram
cor(d, coph)
## [1] 0.5399856
# hc = hclust(d, method = "complete") # complete linkage (maximum distance)
# coph = cophenetic(hc)
# cor(coph, d)

# hc = hclust(d, method = "single") # single linkage (minimum distance)
# coph = cophenetic(hc)
# cor(coph, d)
Next: assigning the cases to clusters (the actual labeling of the clusters) \(\rightarrow\) number of clusters k
Compute the total within-cluster sum of squares (WSS), which measures how similar the data points within a cluster are, for different values of k (k = number of clusters)
Identify the "kink" (elbow) at which the WSS levels off
elbow_plot = fviz_nbclust(x = CA, FUN = hcut, method = c("wss"))
elbow_plot + geom_vline(xintercept = 3, linetype = 2, color = "black")
# Set the number of clusters manually via "xintercept"
k_optimal = which.max(diff(diff(elbow_plot$data$y))) + 1 # largest second difference approximates the elbow
elbow_plot + geom_vline(xintercept = k_optimal, linetype = 2, color = "black")
Measures the quality of a clustering in terms of cohesion (how tightly the points within a cluster hold together) and separation (how well the clusters are separated from each other).
Silhouette coefficient ranges from -1 to +1: values close to +1 indicate well-separated clusters
fviz_nbclust(x = CA, FUN = hcut, method = c("silhouette"))
\(\rightarrow\) k = 2 is the optimal value!
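The silhouette widths behind this plot can also be computed directly with the cluster package (assumed here to be installed); a sketch using the dendrogram hc and the distance object d from above:

```r
# install.packages("cluster")
library(cluster)

sil = silhouette(cutree(hc, k = 2), d) # silhouette width per observation
summary(sil)                           # per-cluster and overall mean widths
# fviz_silhouette(sil)                 # optional plot via factoextra
```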
# Color the dendrogram by k
fviz_dend(hc, cex = 0.2, k = 2, color_labels_by_k = TRUE) + coord_flip()
CA$cluster = cutree(hc, k = 2)
names(CA)
## [1] "P" "E" "N" "cluster"
# Descriptive statistics per cluster
aggregate(CA[, c("P", "E", "N")], by = list(Cluster = CA$cluster), FUN = mean)
## Cluster P E N
## 1 1 -0.06465722 0.3159969 -0.4086034
## 2 2 0.20017166 -0.9782917 1.2649914
aggregate(CA[, c("P", "E", "N")], by = list(Cluster = CA$cluster), FUN = sd)
## Cluster P E N
## 1 1 1.0333320 0.8833778 0.6945988
## 2 2 0.8650689 0.6453664 0.6961974
Cluster 1:
Psychoticism: slightly below average, i.e. tending to be less impulsive and aggressive.
Extraversion: somewhat above average, i.e. rather sociable, energetic, and talkative.
Neuroticism: somewhat below average, i.e. emotionally stable, calm, and less prone to negative emotions.
Members of this cluster tend to be emotionally stable and sociable, with slightly below-average impulsivity.
Cluster 2:
Psychoticism: above average, i.e. tending to be more impulsive and aggressive.
Extraversion: well below average, i.e. rather introverted, less sociable, and reserved.
Neuroticism: well above average, i.e. emotionally unstable and prone to stress and negative emotions.
Members of this cluster tend to be introverted and emotionally unstable, with a tendency toward impulsive behavior.
ggplot(CA, aes(x = as.factor(cluster), y = P)) + geom_boxplot() + labs(title = "Clusterunterschiede: Psychotizismus")
ggplot(CA, aes(x = as.factor(cluster), y = E)) + geom_boxplot() + labs(title = "Clusterunterschiede: Extraversion")
ggplot(CA, aes(x = as.factor(cluster), y = N)) + geom_boxplot() + labs(title = "Clusterunterschiede: Neurotizismus")
t.test(P ~ cluster, data = CA)
##
## Welch Two Sample t-test
##
## data: P by cluster
## t = -2.1641, df = 143.88, p-value = 0.03211
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -0.50671652 -0.02294124
## sample estimates:
## mean in group 1 mean in group 2
## -0.06465722 0.20017166
# # ANOVA as an alternative to pairwise t-tests
# CA$cluster = as.factor(CA$cluster) # convert to factor BEFORE fitting the model
# mod_P = lm(P ~ cluster, data = CA)
# anova(mod_P)
#
# # install.packages("emmeans")
# library(emmeans)
# emmeans(mod_P, pairwise ~ cluster, adjust = "bonferroni")
t.test(E ~ cluster, data = CA)
##
## Welch Two Sample t-test
##
## data: E by cluster
## t = 13.525, df = 166.06, p-value < 0.00000000000000022
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## 1.105344 1.483233
## sample estimates:
## mean in group 1 mean in group 2
## 0.3159969 -0.9782917
t.test(N ~ cluster, data = CA)
##
## Welch Two Sample t-test
##
## data: N by cluster
## t = -17.867, df = 121.72, p-value < 0.00000000000000022
## alternative hypothesis: true difference in means between group 1 and group 2 is not equal to 0
## 95 percent confidence interval:
## -1.859032 -1.488158
## sample estimates:
## mean in group 1 mean in group 2
## -0.4086034 1.2649914
PoliticalDemocracy data set (Bollen, 1989) from the lavaan package.
Latent variables (factors) to be specified in the model:
democracy in 1960 (dem60),
democracy in 1965 (dem65), and
industrial development in 1960 (ind60).
These are measured by manifest variables (indicators):
dem60 is represented by the indicators y1, y2, y3, and y4,
dem65 is represented by the indicators y5, y6, y7, and y8,
ind60 is represented by the indicators x1, x2, and x3.
?PoliticalDemocracy
Hypothesis: industrial development in 1960 (ind60) influences both democracy in 1960 (dem60) and democracy in 1965 (dem65).
It is further assumed that democracy in 1960 (dem60) influences democracy in 1965 (dem65).
The two main components of an SEM:
Measurement model: describes the relationships between the latent variables and their manifest indicators
Structural model: describes the relationships among the latent variables
Note: in SEMs the residuals of the indicators may correlate; such residual correlations must be specified explicitly so that this additional shared variance is accounted for in the model.
model = '
# measurement model
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# structural model
dem65 ~ ind60 + dem60
dem60 ~ ind60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
fit = sem(model, data = PoliticalDemocracy)
summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
## lavaan 0.6-18 ended normally after 68 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 31
##
## Number of observations 75
##
## Model Test User Model:
##
## Test statistic 38.125
## Degrees of freedom 35
## P-value (Chi-square) 0.329
##
## Model Test Baseline Model:
##
## Test statistic 730.654
## Degrees of freedom 55
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.995
## Tucker-Lewis Index (TLI) 0.993
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1547.791
## Loglikelihood unrestricted model (H1) -1528.728
##
## Akaike (AIC) 3157.582
## Bayesian (BIC) 3229.424
## Sample-size adjusted Bayesian (SABIC) 3131.720
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.035
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.092
## P-value H_0: RMSEA <= 0.050 0.611
## P-value H_0: RMSEA >= 0.080 0.114
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.044
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ind60 =~
## x1 1.000 0.670 0.920
## x2 2.180 0.139 15.742 0.000 1.460 0.973
## x3 1.819 0.152 11.967 0.000 1.218 0.872
## dem60 =~
## y1 1.000 2.223 0.850
## y2 1.257 0.182 6.889 0.000 2.794 0.717
## y3 1.058 0.151 6.987 0.000 2.351 0.722
## y4 1.265 0.145 8.722 0.000 2.812 0.846
## dem65 =~
## y5 1.000 2.103 0.808
## y6 1.186 0.169 7.024 0.000 2.493 0.746
## y7 1.280 0.160 8.002 0.000 2.691 0.824
## y8 1.266 0.158 8.007 0.000 2.662 0.828
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dem65 ~
## ind60 0.572 0.221 2.586 0.010 0.182 0.182
## dem60 0.837 0.098 8.514 0.000 0.885 0.885
## dem60 ~
## ind60 1.483 0.399 3.715 0.000 0.447 0.447
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .y1 ~~
## .y5 0.624 0.358 1.741 0.082 0.624 0.296
## .y2 ~~
## .y4 1.313 0.702 1.871 0.061 1.313 0.273
## .y6 2.153 0.734 2.934 0.003 2.153 0.356
## .y3 ~~
## .y7 0.795 0.608 1.308 0.191 0.795 0.191
## .y4 ~~
## .y8 0.348 0.442 0.787 0.431 0.348 0.109
## .y6 ~~
## .y8 1.356 0.568 2.386 0.017 1.356 0.338
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .x1 0.082 0.019 4.184 0.000 0.082 0.154
## .x2 0.120 0.070 1.718 0.086 0.120 0.053
## .x3 0.467 0.090 5.177 0.000 0.467 0.239
## .y1 1.891 0.444 4.256 0.000 1.891 0.277
## .y2 7.373 1.374 5.366 0.000 7.373 0.486
## .y3 5.067 0.952 5.324 0.000 5.067 0.478
## .y4 3.148 0.739 4.261 0.000 3.148 0.285
## .y5 2.351 0.480 4.895 0.000 2.351 0.347
## .y6 4.954 0.914 5.419 0.000 4.954 0.443
## .y7 3.431 0.713 4.814 0.000 3.431 0.322
## .y8 3.254 0.695 4.685 0.000 3.254 0.315
## ind60 0.448 0.087 5.173 0.000 1.000 1.000
## .dem60 3.956 0.921 4.295 0.000 0.800 0.800
## .dem65 0.172 0.215 0.803 0.422 0.039 0.039
##
## R-Square:
## Estimate
## x1 0.846
## x2 0.947
## x3 0.761
## y1 0.723
## y2 0.514
## y3 0.522
## y4 0.715
## y5 0.653
## y6 0.557
## y7 0.678
## y8 0.685
## dem60 0.200
## dem65 0.961
semPaths(fit)
semPaths(fit,
edge.color = "black", # draw paths in black
edge.label.cex = 0.8, # label font size
edge.width = 0.5, # uniform line width
layout = "tree", # layout (alternatives: "circle", "spring")
sizeMan = 6, # size of the manifest variables
sizeLat = 8, # size of the latent variables
whatLabels = "std") # show standardized estimates
pairs(PoliticalDemocracy[, 1:4]) # dem60
# pairs(subset(PoliticalDemocracy, select = c("y1", "y2", "y3", "y4")))
pairs(PoliticalDemocracy[, 5:8]) # dem65
pairs(PoliticalDemocracy[, 9:11]) # ind60
# Multicollinearity?
## Determinant of the correlation matrix (values close to 0 indicate multicollinearity)
cor_1 = cor(PoliticalDemocracy[, 1:4])
det(cor_1)
## [1] 0.1197388
cor_2 = cor(PoliticalDemocracy[, 5:8])
det(cor_2)
## [1] 0.1016993
cor_3 = cor(PoliticalDemocracy[, 9:11])
det(cor_3)
## [1] 0.05381497
# Sample size
nrow(PoliticalDemocracy)
## [1] 75
Assumption: the relationship between dependency cognitions and the severity of depression is mediated by negative self-evaluation.
Hypothesis in the sense of a full (complete) mediation
model <- ' # direct effect
BDI ~ c*ABK
# mediator
NSB ~ a*ABK
BDI ~ b*NSB
# indirect effect (a*b)
ab := a*b
# total effect
total := c + (a*b)
'
set.seed(123)
fit = sem(model, data = FIE, se = "bootstrap", bootstrap = 1000)
summary(fit, fit.measures = TRUE, standardized = TRUE)
## lavaan 0.6-18 ended normally after 1 iteration
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 5
##
## Number of observations 191
##
## Model Test User Model:
##
## Test statistic 0.000
## Degrees of freedom 0
##
## Model Test Baseline Model:
##
## Test statistic 176.345
## Degrees of freedom 3
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.000
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1353.544
## Loglikelihood unrestricted model (H1) -1353.544
##
## Akaike (AIC) 2717.088
## Bayesian (BIC) 2733.349
## Sample-size adjusted Bayesian (SABIC) 2717.511
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.000
## P-value H_0: RMSEA <= 0.050 NA
## P-value H_0: RMSEA >= 0.080 NA
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.000
##
## Parameter Estimates:
##
## Standard errors Bootstrap
## Number of requested bootstrap draws 1000
## Number of successful bootstrap draws 998
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## BDI ~
## ABK (c) 0.114 0.088 1.296 0.195 0.114 0.083
## NSB ~
## ABK (a) 0.507 0.082 6.161 0.000 0.507 0.420
## BDI ~
## NSB (b) 0.771 0.078 9.866 0.000 0.771 0.681
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .BDI 60.722 6.353 9.558 0.000 60.722 0.482
## .NSB 80.736 7.806 10.343 0.000 80.736 0.823
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## ab 0.391 0.074 5.288 0.000 0.391 0.286
## total 0.505 0.091 5.571 0.000 0.505 0.369
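Since the standard errors were bootstrapped, bootstrap confidence intervals for the indirect effect ab are the natural complement to the z-tests; a sketch on the fitted object fit:

```r
# bias-corrected bootstrap confidence intervals for all parameters, including ab and total
parameterEstimates(fit, boot.ci.type = "bca.simple", level = 0.95)
```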
The relationship between dependency cognitions and the severity of depression is fully mediated by negative self-evaluation: the direct effect of dependency cognitions on depression is not significant, while both the effect of dependency cognitions on negative self-evaluation and the effect of negative self-evaluation on depression are significant.
Practical implications, e.g. therapeutic approaches: interventions should specifically target and change negative self-evaluations, since these are the decisive factor.
Prevention: people with dependency cognitions could be supported by preventive measures that help them avoid negative self-evaluations and thus lower the risk of depression.